Individual Privacy Accounting with Gaussian Differential Privacy
Individual privacy accounting enables bounding differential privacy (DP) loss
individually for each participant involved in the analysis. This can be
informative as often the individual privacy losses are considerably smaller
than those indicated by the DP bounds that are based on considering worst-case
bounds at each data access. In order to account for the individual privacy
losses in a principled manner, we need a privacy accountant for adaptive
compositions of randomised mechanisms, where the loss incurred at a given data
access is allowed to be smaller than the worst-case loss. This kind of analysis
has been carried out for Rényi differential privacy (RDP) by Feldman and
Zrnic (2021), but not yet for the so-called optimal privacy accountants. We
make first steps in this direction by providing a careful analysis using the
Gaussian differential privacy, which gives optimal bounds for the Gaussian
mechanism, one of the most versatile DP mechanisms. This approach is based on
determining a certain supermartingale for the hockey-stick divergence and on
extending the Rényi divergence-based fully adaptive composition results by
Feldman and Zrnic (2021). We also consider measuring the individual
$(\varepsilon,\delta)$-privacy losses using the so-called privacy loss
distributions. With the help of the Blackwell theorem, we can then make use of
the RDP analysis to construct an approximate individual
$(\varepsilon,\delta)$-accountant.
Comment: 27 pages, 10 figures
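To make the gap between worst-case and individual losses concrete, the following is a minimal sketch using the standard Gaussian differential privacy (GDP) composition rule and the standard conversion from $\mu$-GDP to $(\varepsilon,\delta)$-DP. It is not the fully adaptive accountant described in the abstract. The function names, the step count, and the per-step $\mu$ values are hypothetical and chosen only for illustration.

```python
import math


def std_normal_cdf(x: float) -> float:
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def compose_gdp(mus):
    """Compose Gaussian mechanisms: per-step mu values add in quadrature."""
    return math.sqrt(sum(m * m for m in mus))


def gdp_delta(eps: float, mu: float) -> float:
    """Standard conversion: delta(eps) for a mu-GDP mechanism (mu > 0)."""
    return (std_normal_cdf(-eps / mu + mu / 2.0)
            - math.exp(eps) * std_normal_cdf(-eps / mu - mu / 2.0))


# Hypothetical numbers: 100 data accesses. The worst-case per-step
# parameter is 0.1, while one individual's per-step parameter is 0.03
# (for example, because that individual's contribution is small).
worst_case_mu = compose_gdp([0.1] * 100)
individual_mu = compose_gdp([0.03] * 100)

worst_case_delta = gdp_delta(1.0, worst_case_mu)
individual_delta = gdp_delta(1.0, individual_mu)
```

At the same $\varepsilon$, the individual's $\delta$ is smaller than the worst-case $\delta$, which is the kind of slack that individual accounting is meant to expose.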
GPrank : an R package for detecting dynamic elements from genome-wide time series
Background: Genome-wide high-throughput sequencing (HTS) time series experiments are a powerful tool for monitoring various genomic elements over time. They can be used to monitor, for example, gene or transcript expression with RNA sequencing (RNA-seq), DNA methylation levels with bisulfite sequencing (BS-seq), or abundances of genetic variants in populations with pooled sequencing (Pool-seq). However, because of high experimental costs, the time series data sets often consist of a very limited number of time points with very few or no biological replicates, posing challenges in the data analysis. Results: Here we present the GPrank R package for modelling genome-wide time series by incorporating variance information obtained during pre-processing of the HTS data using probabilistic quantification methods or from a beta-binomial model using sequencing depth. GPrank is well-suited for analysing both short and irregularly sampled time series. It is based on modelling each time series by two Gaussian process (GP) models, namely, time-dependent and time-independent GP models, and comparing the evidence provided by the data under the two models by computing their Bayes factor (BF). Genomic elements are then ranked by their BFs, and the temporally most dynamic elements can be identified. Conclusions: Incorporating the variance information helps GPrank avoid false positives without compromising computational efficiency. Fitted models can be easily further explored in a browser. Detection and visualisation of the temporally most dynamic elements in the genome can provide a good starting point for further downstream analyses for increasing our understanding of the studied processes.
Peer reviewed